Automatic Terminology Intelligibility Estimation for Readership-oriented Technical Writing
Abstract
This paper describes a method for automatically estimating the intelligibility of technical terms, in support of readership-oriented technical writing. We assume that a term's frequency, weighted by document type, can serve as an indicator of the term's intelligibility for a particular readership. From this standpoint, we analyzed the relationship between two quantities: (1) the average intelligibility levels of 46 technical terms, as rated by about 120 laymen, and (2) the number of documents that an Internet search engine retrieves from various types of websites when each term is used as a keyword (i.e., the term frequencies). The analysis shows that term intelligibility for a target readership can be estimated by regression analysis of the term frequencies weighted by type of website. As pilot studies, we developed two regression models for estimating technical term intelligibility for the target readership: one uses a machine learning method based on ν-SVR, and the other uses multiple regression. To evaluate the models, we compared their estimates with the results of a survey of laymen's intelligibility levels for 50 new technical terms. The correlation coefficient between the survey results and the estimated results was 0.66.
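The multiple-regression variant described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the two website types ("general" and "specialist"), the `log10(1 + n)` transform of the hit counts, the toy hit counts, and the toy ratings are all assumptions made up for illustration, and the ν-SVR model is not shown. The sketch fits ordinary least squares with an intercept and then compares estimates with ratings using a Pearson correlation coefficient, which is the kind of comparison the abstract reports.

```python
import math


def log_counts(hit_counts):
    """Compress raw hit counts with log10(1 + n) (an assumed transform)."""
    return [math.log10(1 + n) for n in hit_counts]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature):
    """Estimated intelligibility: intercept plus one weight per website type."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): hits per term on (general, specialist) websites,
# and a made-up average intelligibility rating for each term.
toy_hits = [(1200000, 5000), (30000, 8000), (500, 9000), (900000, 700), (2000, 150)]
toy_ratings = [4.5, 3.0, 1.5, 4.6, 2.0]

toy_features = [log_counts(h) for h in toy_hits]
toy_weights = fit(toy_features, toy_ratings)
toy_estimates = [predict(toy_weights, f) for f in toy_features]
print("weights:", toy_weights)
print("correlation on toy data:", pearson(toy_estimates, toy_ratings))
```

The correlation printed here is computed on the same toy terms used for fitting, so it says nothing about generalization. The evaluation described in the abstract instead compares estimates against survey results for 50 new terms.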
Similar resources
Spectral Features for Automatic Blind Intelligibility Estimation of Spastic Dysarthric Speech
In this paper, we explore the use of the standard ITU-T P.563 speech quality estimation algorithm for automatic assessment of dysarthric speech intelligibility. A linear mapping consisting of three salient P.563 internal features is proposed and shown to accurately estimate spastic dysarthric speech intelligibility. Delta-energy features are further proposed in order to characterize the atypica...
Automatic speech recognition for assistive writing in speech supplemented word prediction
This paper describes a system for assistive writing, the Speech Supplemented Word Prediction Program (SSWPP). This system uses the first letter of a word typed by the user as well as the user’s (possibly low-intelligibility) speech to predict the intended word. The ASR system, which is the focus of this paper, is a speaker-dependent isolated-word recognition system. Word-level results from a no...
Relating automatic vowel space estimates to talker intelligibility
Differences in pronunciation have been shown to underlie significant talker-dependent intelligibility differences. There are several dimensions of variability that are correlated with talker intelligibility including pitch range, vowel-space expansion, and rhythmic patterns. Prior work has shown that some of the better predictors of individual intelligibility are based on the talker’s F1 by F2 ...
DEFINDER: Rule-based Methods for the Extraction of Medical Terminology and their Associated Definitions from On-line Text
The problem addressed in this paper concerns the automatic identification and extraction of medical terms along with their definitions and modifiers from full text consumer-oriented medical articles. The system, DEFINDER (Definition Finder), uses rule-based techniques. The output of our system can be used in several applications: creation and/or enhancement of on-line terminologica...
Objective Estimation of Dysarthric Speech Intelligibility
The de-facto standard for dysarthric intelligibility assessment is a subjective intelligibility test, performed by an expert. Subjective tests are often costly, biased and inconsistent because of their perceptual nature. Automatic objective assessment methods, in contrast, are repeatable and relatively cheap. Objective methods can be broken down into two subcategories: reference-free, and refer...
Publication date: 2006